Abstract

This paper argues that the ideas underlying the renormalization group technique used to 
characterize phase transitions in condensed matter systems could be useful for distinguishing 
computational complexity classes. The paper presents a renormalization group transformation 
that maps an arbitrary Boolean function of N Boolean variables to one of N − 1 variables. When
this transformation is applied repeatedly, the behavior of the resulting sequence of functions is
different for a generic Boolean function than for Boolean functions that can be written as a
polynomial of degree ξ with ξ ≪ N as well as for functions that depend on composite variables
such as the arithmetic sum of the inputs. Being able to demonstrate that functions are non- 
generic is of interest because it suggests an avenue for constructing an algorithm capable of 
demonstrating that a given Boolean function cannot be computed using resources that are 
bounded by a polynomial of N. 



1 Introduction 



Computational complexity characterizes how the computational resources to solve a problem de- 
pend on the size of the problem specification [1]. Two well-known complexity classes [2] are P, 
problems that can be solved with resources that scale polynomially with the problem size, and NP, 
the class of problems for which a solution can be verified with polynomial resources. Whether or 
not P is equal to NP [3, 4] is a great outstanding question in computational complexity theory and 
in mathematics generally [5-9]. 



In this paper it is argued that a method known in statistical physics as the renormalization group 
(RG) [10-13] may yield useful insight into the P versus NP question. This technique, originally 
formulated to provide insight into the nature of phase transitions in statistical mechanical sys- 
tems [11,12], involves taking a problem with N variables and then rewriting it as a problem 
involving fewer variables. Here, we will define a procedure by which a given Boolean function of 
N Boolean variables is used to generate a Boolean function of N − 1 variables, and investigate the
properties of the resulting sequence of functions as this procedure is iterated [14]. The transformation
used here is very simple: the new function is one if the original function changes its output
value when a given input variable's value is changed, and is zero if it does not. It is shown that when
this transformation is applied repeatedly, the behavior of the resulting sequence of functions can
be used to distinguish generic Boolean functions from functions that are known to be computable 
using polynomially bounded resources. 



Any Boolean function $f(x_1, \ldots, x_N)$ of the N Boolean variables $x_1, \ldots, x_N$ can be written as a
polynomial in the $x_j$ using modulo-two addition. This follows because the variables and function
all can be only 0 or 1, so $f(x_1, \ldots, x_N)$ can be written as

$$
\begin{aligned}
f(x_1,\ldots,x_N) ={}& A_{00\ldots00}\,(1 \oplus x_1)(1 \oplus x_2)\cdots(1 \oplus x_{N-1})(1 \oplus x_N) \\
&\oplus A_{00\ldots01}\,(1 \oplus x_1)(1 \oplus x_2)\cdots(1 \oplus x_{N-1})(x_N) \\
&\oplus \cdots \\
&\oplus A_{11\ldots10}\,(x_1)(x_2)\cdots(x_{N-1})(1 \oplus x_N) \\
&\oplus A_{11\ldots11}\,(x_1)(x_2)\cdots(x_{N-1})(x_N) ,
\end{aligned}
\tag{1}
$$

where $A_{x_1,\ldots,x_N} = f(x_1,\ldots,x_N)$. As Shannon pointed out [15], the number of different possible
functions is $2^{2^N}$ (this follows because each of the $2^N$ coefficients $A_{\alpha_1\ldots\alpha_N}$ can be either one or zero).
This is much larger than the number of functions that can be computed using resources that scale
no faster than as a polynomial of N, which scales asymptotically as $(CN)^t$, where C is a constant
and t is a polynomial in N [16, 17]. This counting argument demonstrates that almost all functions
cannot be evaluated using polynomially bounded resources and hence are not in P. However, it 
does not provide a means for determining whether or not a given function can be computed with 
polynomial resources. 
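
The expansion in Eq. (1) and the counting of Boolean functions can be checked numerically for small N. The following Python sketch is illustrative only; the names `eval_eq1` and `example_f`, and the particular three-variable function, are hypothetical choices made here and are not part of the original analysis.

```python
from itertools import product

def eval_eq1(coeffs, x):
    """Evaluate the right-hand side of Eq. (1) at the input tuple x.

    coeffs maps each bit tuple a = (a_1, ..., a_N) to the coefficient A_a.
    Each term is A_a times a product of factors x_j (if a_j = 1) or
    1 XOR x_j (if a_j = 0); terms are combined by addition modulo 2.
    """
    total = 0
    for a, coeff in coeffs.items():
        term = coeff
        for a_j, x_j in zip(a, x):
            term &= x_j if a_j else (1 ^ x_j)
        total ^= term
    return total

def example_f(x):
    """Arbitrary illustrative function of three bits (an assumption)."""
    return (x[0] & x[1]) ^ x[2]

N = 3
inputs = list(product((0, 1), repeat=N))

# Setting A_a = f(a) reproduces f on every input, as stated below Eq. (1).
coeffs = {a: example_f(a) for a in inputs}
reproduces_f = all(eval_eq1(coeffs, x) == example_f(x) for x in inputs)

# Each of the 2**N coefficients is 0 or 1, giving 2**(2**N) functions.
num_functions = 2 ** (2 ** N)
print(reproduces_f, num_functions)
```

Already at N = 3 there are 256 distinct functions, and the count doubles in the exponent with each added variable, which is the source of the counting argument above.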



It is shown here that different classes of functions have different behavior upon repeated applica- 
tion of a renormalization group transformation. In analogy with well-known results in statistical 
mechanics [10], we interpret functions exhibiting different behaviors after many renormalizations 
as being in different phases. Generic Boolean functions exhibit simple "fixed point" behavior upon 
renormalization, and hence we claim that they comprise a phase. A function that can be written 
either as a low-order polynomial or as a function of a composite variable such as the arithmetic 
sum of the values of the inputs yields non-generic behavior upon renormalization, and so is in a 
non-generic phase. 

We then discuss what would be needed to be able to use the renormalization group approach to 
demonstrate that a given Boolean function of N variables cannot be evaluated with resources that 
are bounded above by a polynomial in N. This issue is relevant to the P versus NP question 
because if we can identify a function in NP that we can show is not in P, then we will have shown 
that P and NP are not equal. Some functions that are in P depend on the arithmetic sum of 
the inputs, including MAJORITY, which is one if more than half the inputs are nonzero and zero 
otherwise [18], and DIVISIBILITY MOD p, which is one if the sum of the inputs is divisible by 
an odd prime p [19, 20], and the renormalization group approach identifies these functions as non- 
generic. The renormalization group approach identifies low-order polynomials as non-generic, and 
some but not all low-order polynomials are in P. Because there are functions in P that are the sum 
of a low-order polynomial plus a small random component that is nonzero on a small fraction of 
the inputs, and because such functions will "flow" to the generic fixed point upon renormalization, 
P is not a phase in the statistical mechanical sense. Therefore, there are functions known to be 
in P that can be identified as non-generic only because they are close to a phase boundary in the 
sense that they differ from a low-order polynomial on a small fraction of the inputs. Thus, the 
renormalization group approach provides a means for understanding why the P versus NP question
is so difficult: showing that a function is not in P using the renormalization group approach
requires determining not only that it is not in a non-generic phase but also that it is not near
a phase boundary, a task that appears to require resources that grow faster than exponentially
with N. This superexponential scaling means that the procedure proposed here cannot be used
to break pseudorandom number generators, a difficulty that would arise if the procedure could be
implemented with resources that scale no faster than exponentially with N [21]. However, at this
point we cannot prove that a given function is not in P: our procedure distinguishes every function
in P of which we are aware from a generic Boolean function, but we have not demonstrated that 
the procedure works for all functions that are in P. 

The paper is organized as follows. Sec. 2 presents the transformation that maps a Boolean function
of N variables into a Boolean function of N − 1 variables. Repeatedly applying this transformation
yields a sequence of functions, and in Sec. 2 it is shown that (1) if one starts with a generic random
Boolean function, then the resulting sequence of functions has the property that all functions in
it are nonzero for just about half the input configurations, (2) applying the RG transformation ξ
times to a function that is a polynomial of order less than ξ yields zero, and (3) applying the RG
transformation to functions that depend on a composite variable such as the sum of the values of all
the inputs also yields a sequence of functions that differs from the result for a generic Boolean
function. In Sec. 3 it is shown that simply applying the RG transformation many times does not
identify functions that can be written as the sum of a low-order polynomial plus a contribution that
is nonzero on a small fraction of the inputs. One can identify functions of this type by examining
the set of functions whose outputs differ from the original one on a small fraction of the input
configurations: one of the functions in the set will be a low-order polynomial. Sec. 4 discusses the
results in the framework of phase transitions in condensed matter systems, which renormalization
group transformations are typically used to study, and also discusses how the strategy discussed
here avoids the difficulties of "natural proofs" described in Ref. [21]. Sec. 5 presents the conclusions.
Appendix A presents the arguments demonstrating why it is plausible that most functions that can be
computed with polynomially bounded resources can be written as a low-order polynomial plus a
term that is nonzero for a fraction of input configurations that is exponentially small in N/log(N),
and discusses the non-generic nature of the functions in P that do not have that property. Appendix
B shows that a typical Boolean function cannot be written as a low-order polynomial plus a term
that is exponentially small in N/log(N).



2 Renormalization group transformation 



The renormalization group (RG) procedure we define takes a given function of N variables and 
generates a function of N − 1 variables [10-14]. The variable that is eliminated is called the
"decimated" variable. The procedure can be iterated, mapping a function of N − 1 variables into
one of N − 2 variables, etc.

The transformation studied here specifies whether the original function's value changes if a given 
input variable is changed. Specifically, given a function $f(x_1, \ldots, x_N) = f(x)$, we define

$$
\begin{aligned}
& g_{i_1}(x_1, x_2, \ldots, x_{i_1-1}, x_{i_1+1}, \ldots, x_N) = g_{i_1}(x') \\
&\quad = f(x_1, x_2, \ldots, x_{i_1-1}, 0, x_{i_1+1}, \ldots, x_N) \oplus f(x_1, x_2, \ldots, x_{i_1-1}, 1, x_{i_1+1}, \ldots, x_N) ,
\end{aligned}
\tag{2}
$$

where $\oplus$ denotes addition modulo 2 [22], and the vector $x'$ denotes the set of undecimated variables.
The function $g_{i_1}(x_1, x_2, \ldots, x_{i_1-1}, x_{i_1+1}, \ldots, x_N)$ is one if the output of the function $f$ changes when
the value of the decimated variable $x_{i_1}$ is changed and zero if it does not. Once $g_{i_1}$ has been obtained,
the procedure can be repeated and one can define $g_{i_1,i_2}$ as

$$
\begin{aligned}
& g_{i_1,i_2}(x_1, x_2, \ldots, x_{i_1-1}, x_{i_1+1}, \ldots, x_{i_2-1}, x_{i_2+1}, \ldots, x_N) = g_{i_1,i_2}(x') \\
&\quad = g_{i_1}(x_1, x_2, \ldots, x_{i_2-1}, 0, x_{i_2+1}, \ldots, x_N) \\
&\quad \oplus g_{i_1}(x_1, x_2, \ldots, x_{i_2-1}, 1, x_{i_2+1}, \ldots, x_N) \\
&\quad = f(x_1, x_2, \ldots, x_{i_1-1}, 0, x_{i_1+1}, \ldots, x_{i_2-1}, 0, x_{i_2+1}, \ldots, x_N) \\
&\quad \oplus f(x_1, x_2, \ldots, x_{i_1-1}, 0, x_{i_1+1}, \ldots, x_{i_2-1}, 1, x_{i_2+1}, \ldots, x_N) \\
&\quad \oplus f(x_1, x_2, \ldots, x_{i_1-1}, 1, x_{i_1+1}, \ldots, x_{i_2-1}, 0, x_{i_2+1}, \ldots, x_N) \\
&\quad \oplus f(x_1, x_2, \ldots, x_{i_1-1}, 1, x_{i_1+1}, \ldots, x_{i_2-1}, 1, x_{i_2+1}, \ldots, x_N) ,
\end{aligned}
\tag{3}
$$

where the sums all denote addition modulo two. The function $g_{i_1,\ldots,i_m}(x')$ obtained by decimating
the m variables $x_{i_1}, \ldots, x_{i_m}$ does not depend on the order in which the variables are decimated.
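
The transformation of Eqs. (2) and (3) acts directly on a truth table, which makes it easy to experiment with for small N. The sketch below is a minimal illustration under assumed conventions: the function is stored as a list of length $2^N$, bit j of the list index holds the value of the variable at position j (0-indexed), and the helper names `decimate` and `four_term` are hypothetical. It checks order independence and the four-term form of Eq. (3) on one randomly chosen function.

```python
import random

def decimate(table, i):
    """Apply Eq. (2): XOR of f with the bit at position i set to 0 and to 1.

    table is a truth table of length 2**n; bit j of an index holds the
    value of the variable at position j (0-indexed, an assumed convention).
    """
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        # Re-insert a 0 at bit position i of the reduced index r.
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

def four_term(table, r):
    """Right-hand side of Eq. (3) for N = 4 with positions 0 and 2 decimated.

    Bit 0 of r is the variable at original position 1, bit 1 of r is the
    variable at original position 3.
    """
    base = ((r & 1) << 1) | ((r >> 1) << 3)
    return table[base] ^ table[base | 1] ^ table[base | 4] ^ table[base | 5]

random.seed(0)
N = 4
f = [random.randint(0, 1) for _ in range(1 << N)]

# Decimate the variables at original positions 0 and 2 in both orders.
# After removing position 0, the variable originally at 2 sits at 1.
g_02 = decimate(decimate(f, 0), 1)
g_20 = decimate(decimate(f, 2), 0)
print(g_02 == g_20)
print(all(g_02[r] == four_term(f, r) for r in range(4)))
```

Representing the function by its full truth table costs $2^N$ storage, so this is only a tool for building intuition at small N.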

First we examine functions for which each of the coefficients $A_{\alpha_1,\alpha_2,\ldots,\alpha_N}$ in Eq. (1) is an independent
random variable chosen to be one with probability $p_0$ and zero with probability $q_0 = 1 - p_0$,
where $0 < p_0 < 1$. We consider the sequence of functions obtained by successive application
of the renormalization group transformation to such a generic random function. The coefficients
$A'_{x_1,\ldots,x_{i_1-1},x_{i_1+1},\ldots,x_N}$ that characterize the function $g_{i_1}(x')$ obtained by decimating the variable $i_1$
via Eq. (2) are

$$
A'_{x_1,\ldots,x_{i_1-1},x_{i_1+1},\ldots,x_N} = A_{x_1,\ldots,x_{i_1-1},0,x_{i_1+1},\ldots,x_N} \oplus A_{x_1,\ldots,x_{i_1-1},1,x_{i_1+1},\ldots,x_N} . \tag{4}
$$

The original $A$'s are uncorrelated random variables, so it follows that the $A'$'s are independent
random variables that are one with probability $p_1 = 2p_0q_0$ and zero with probability $1 - p_1$. After
$\ell$ iterations (after which $\ell$ variables have been eliminated), the coefficients are still uncorrelated
random variables, and they are now one with probability $p_\ell$ and zero with probability $1 - p_\ell$, where
the $p_\ell$ satisfy the recursion relation

$$
p_{\ell+1} = 2p_\ell(1 - p_\ell) . \tag{5}
$$

The solution to Eq. (5) is

$$
p_\ell = \frac{1}{2}\left(1 - (1 - 2p_0)^{2^{\ell}}\right) . \tag{6}
$$



For any $p_0$ satisfying $0 < p_0 < 1$, the values of the $p_\ell$ "flow" as $\ell$ increases and eventually approach
the "fixed-point value" of 1/2 [10]. This behavior is exactly analogous to that displayed by the
partition functions describing thermodynamic phases in statistical mechanical systems, and so we
interpret this behavior as evidence that there is a phase of generic Boolean functions.

In Sec. 3 we will be considering values of $p_0$ that are very small but nonzero, for which case $p_\ell$
grows exponentially with $\ell$:

$$
p_\ell = 2^{\ell} p_0 \quad (\text{when } p_\ell \ll 1) . \tag{7}
$$

After many renormalizations such functions will "flow" to the generic fixed point, so they are in
the generic phase. If one chooses $p_0 = P(N)2^{-N}$, where $P(N)$ is a polynomial in N, the function
can be specified with polynomially bounded resources by enumerating all input configurations for
which the function is nonzero.
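
The recursion of Eq. (5), its closed-form solution Eq. (6), and the small-$p_0$ behavior of Eq. (7) can be compared numerically. The sketch below is illustrative; the function names and the sample values $p_0 = 0.1$ and $p_0 = 10^{-6}$ are assumptions chosen here for demonstration.

```python
def p_recursive(p0, steps):
    """Iterate Eq. (5), p -> 2 p (1 - p), for the given number of steps."""
    p = p0
    for _ in range(steps):
        p = 2.0 * p * (1.0 - p)
    return p

def p_closed(p0, steps):
    """Closed form of Eq. (6)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p0) ** (2 ** steps))

# Flow toward the fixed point 1/2 for a sample starting value.
for ell in range(7):
    print(ell, p_recursive(0.1, ell), p_closed(0.1, ell))

# For very small p0 the growth is close to 2**ell * p0, as in Eq. (7).
p0_small = 1e-6
ratio = p_recursive(p0_small, 5) / (2 ** 5 * p0_small)
print(ratio)
```

The substitution $u_\ell = 1 - 2p_\ell$ turns Eq. (5) into $u_{\ell+1} = u_\ell^2$, which is why the exponent in Eq. (6) doubles at every step.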

Note that when the RG transformation is applied to a generic Boolean function, all the functions
that are generated yield an output that is zero on a fraction of the inputs that differs from 1/2 by an
amount that is exponentially small in N. This follows because almost all Boolean functions have an
initial value of $p_0$ that differs from 1/2 by an amount of order the inverse square root of the number of values
chosen, or $(2^N)^{-1/2} = 2^{-N/2}$. Since all the $p_\ell$ deviate from 1/2 by an amount that is exponentially
small in N, and since the number of independent input configurations remains exponentially large
in N until the number of decimated variables is of order N, for every function obtained via the
renormalization transformation, the fraction of input configurations yielding zero deviates from 1/2
by an amount that is exponentially small in N.

We next demonstrate that Boolean functions that can be written as polynomials of degree ξ or
less with ξ < N have the property that they yield zero after ξ + 1 renormalizations, for any choice
of the decimated variables.

First we examine a simple example. The parity function $\mathcal{P}(x_1, \ldots, x_N)$, which is 1 if an odd number
of input variables are 1 and 0 if an even number of the input variables are 1 [9, 23-25], can be written
as

$$
\mathcal{P}(x_1, \ldots, x_N) = x_1 \oplus x_2 \oplus \cdots \oplus x_N . \tag{8}
$$

There are many less efficient ways to write the parity function, but the result of the renormalization
procedure does not depend on how one has chosen to write the function, since it can be computed
knowing only the values of the function for all different input configurations. For the parity function,
one finds, for any choice of decimated variables $x_{i_1}$ and $x_{i_2}$, the functions resulting from one and
two renormalizations, $g^{\mathcal{P}}_{i_1}(x')$ and $g^{\mathcal{P}}_{i_1,i_2}(x')$, are:

$$
\begin{aligned}
g^{\mathcal{P}}_{i_1}(x') &= x_{i_1} \oplus (1 - x_{i_1}) = 1 ; \\
g^{\mathcal{P}}_{i_1,i_2}(x') &= 0 .
\end{aligned}
$$

Thus, applying the renormalization transformation to the parity function yields zero after two
iterations, in contrast to the behavior of a generic Boolean function.
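
The parity result can be confirmed on a truth table. The sketch below is illustrative only: it assumes the same truth-table convention as before (bit j of the list index is the variable at position j), and the helper `decimate` and the choice N = 5 are hypothetical.

```python
def decimate(table, i):
    """Apply Eq. (2) to a truth table, removing the variable at position i."""
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

N = 5
# Parity of Eq. (8): 1 when an odd number of input bits are set.
parity = [bin(idx).count("1") & 1 for idx in range(1 << N)]

g1 = decimate(parity, 2)   # one renormalization: constant 1
g2 = decimate(g1, 0)       # two renormalizations: constant 0
print(set(g1), set(g2))
```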

More generally, for any term of the form $T = y_{i_1} y_{i_2} \cdots y_{i_m}$, with $y_i = x_i$ or $1 - x_i$, the quantity
$T(x_i = 1) \oplus T(x_i = 0)$ is either zero (if $y_i$ does not occur in $T$) or else is the product of $m - 1$
instead of $m$ of the $y$'s; for example

$$
T(y_{i_1} = 1) \oplus T(y_{i_1} = 0) = y_{i_2} \cdots y_{i_m} . \tag{9}
$$

Because the effect of the RG procedure on the sum of terms is equal to the sum of the results of
the transformation applied to the individual terms, any function that is the mod-2 sum of terms
that are all products of fewer than $m$ $y$'s will yield zero after $m$ renormalizations, for any choice
of the decimated variables. It follows immediately that a function that is a polynomial of degree ξ
or less has the property that applying the RG transformation to it ξ + 1 times yields zero for any
choice of the decimated variables.
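
As a concrete check of this statement, the sketch below builds a small degree-2 polynomial and applies the transformation three times for every ordered choice of decimated positions. It is a minimal illustration: the specific polynomial, the bitmask encoding of monomials, and the helper names `decimate` and `poly_table` are assumptions introduced here.

```python
from itertools import product

def decimate(table, i):
    """Apply Eq. (2) to a truth table, removing the variable at position i."""
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

def poly_table(n, monomials):
    """Truth table of a mod-2 sum of monomials.

    Each monomial is a bitmask whose set bits select the variables that
    are multiplied together (an assumed encoding).
    """
    return [
        sum((idx & m) == m for m in monomials) & 1
        for idx in range(1 << n)
    ]

N, xi = 6, 2
# Hypothetical degree-2 example: x0*x1 + x2*x3 + x4 + x0*x5 (mod 2).
monomials = [0b000011, 0b001100, 0b010000, 0b100001]
f = poly_table(N, monomials)

# Every ordered choice of xi + 1 = 3 decimated positions gives zero.
all_zero = all(
    not any(decimate(decimate(decimate(f, i), j), k))
    for i, j, k in product(range(N), range(N - 1), range(N - 2))
)

# Two decimations need not give zero: removing x0 and then x1 leaves the
# coefficient of x0*x1, which is 1 in this example.
after_two = decimate(decimate(f, 0), 0)
print(all_zero, set(after_two))
```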

This result demonstrates that the RG transformation distinguishes generic Boolean functions from 
functions that can be written as polynomials of degree ξ or less, when ξ < N. The qualitatively
different behavior upon renormalization of polynomials of degree ξ from generic Boolean functions
can be interpreted as evidence that these two classes of functions are in different phases. 

We now demonstrate that the RG method also identifies as non-generic functions that depend 
on a composite quantity such as the arithmetic sum of the variables. Functions in P with this 
property include MAJORITY (which is one if more than half the inputs are set to one, and zero 
otherwise) [18] and DIVISIBILITY MOD p (which is one if the number of inputs that are set 
to one is divisible by an odd prime p and zero otherwise) [19,20]. The renormalization group 
approach distinguishes such functions from generic Boolean functions because the output of all the 
functions in the sequence is constrained to be identical for very large sets of input configurations. 
We first show that MAJORITY and DIVISIBILITY MOD p are both distinguished from a generic 
Boolean function by the renormalization group procedure, and then we argue that the RG procedure 
distinguishes any function of the arithmetic sum of the inputs from a generic Boolean function. We 
expect that the argument will be generalizable to apply to a broad class of functions that depend 
on other composite quantities that are specific combinations of the input variables. 

First we consider the behavior when the RG transformation is applied to DIVISIBILITY MOD
3. Since this function is nonzero when the arithmetic sum $\sum_{j=1}^{N} x_j$ is divisible by 3, changing an
input $x_i$ changes the output value when the sum of the other input variables is congruent to either zero or two modulo 3.
Thus, the renormalized function $g_i(x')$ is nonzero for any i on a fraction of the input configurations
that is very close to 2/3. Every succeeding renormalization also yields a function that is nonzero
for two of the three possible values of the sum of the remaining variables modulo 3. This behavior differs from that of a
generic Boolean function, in which the renormalized functions are nonzero for a fraction of inputs
that is very close to 1/2. More generally, when the RG is applied to DIVISIBILITY MOD p, with
p an odd prime, the behavior of the sequence of functions is determined by the value of the mod p
remainder of the undecimated variables. The functions in the sequence yield the output one when
the remainder mod p takes on certain values, and typically, after a small number of iterations,
these values cycle with a finite period. Therefore, the fraction of input configurations that lead to
a nonzero output essentially cycles also (the cycling is not exact only because the fraction of input
configurations with a given value of the remainder mod p changes very slightly with N), and, since
p is odd, none of the fractions in the cycle is close to 1/2.
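
The behavior described for DIVISIBILITY MOD 3 can be observed directly for a modest number of variables. The following sketch is illustrative only; N = 12, the number of steps, and the helper `decimate` are assumptions made here. Because the function depends only on the arithmetic sum of the inputs, the choice of decimated variable does not affect the fractions.

```python
def decimate(table, i):
    """Apply Eq. (2) to a truth table, removing the variable at position i."""
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

N = 12
# DIVISIBILITY MOD 3: 1 when the number of set input bits is divisible by 3.
f = [1 if bin(idx).count("1") % 3 == 0 else 0 for idx in range(1 << N)]

fractions = []
g = f
for _ in range(5):
    g = decimate(g, 0)  # the function is symmetric, so any position works
    fractions.append(sum(g) / len(g))
print([round(fr, 4) for fr in fractions])
```

In this small example every renormalized function is nonzero on a fraction of inputs close to 2/3 rather than 1/2, consistent with the non-generic behavior discussed above.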

The behavior obtained when the RG procedure is applied to the MAJORITY function is also 
significantly different from that of a generic Boolean function. The first renormalization step 
yields a function that is nonzero when the sum of the undecimated variables is N/2 − 1, and the
second step yields a function that is nonzero when the sum of the undecimated variables is either
N/2 − 2 or N/2 − 1. The functions obtained after j decimations are nonzero on a fraction of inputs
that is bounded above by $Cj/\sqrt{N}$, where C is a constant of order unity, so long as $j \ll \sqrt{N}$. The
original function is thus identified as non-generic because so long as the number of renormalizations
applied is much smaller than $\sqrt{N}$ the renormalized functions are all nonzero on a fraction of input
configurations that is much less than 1/2.
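
A small numerical experiment shows the same trend for MAJORITY. The sketch below is an illustration under stated assumptions: N = 16, the convention that MAJORITY is one when strictly more than N/2 inputs are set, and the helper `decimate` are choices made here. With a different tie-breaking convention the sums at which the renormalized functions are nonzero shift by one, but the fractions behave similarly.

```python
from math import comb

def decimate(table, i):
    """Apply Eq. (2) to a truth table, removing the variable at position i."""
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

N = 16
# MAJORITY (assumed convention): 1 when more than N/2 input bits are set.
f = [1 if bin(idx).count("1") > N // 2 else 0 for idx in range(1 << N)]

fractions = []
g = f
for _ in range(3):
    g = decimate(g, 0)  # the function is symmetric, so any position works
    fractions.append(sum(g) / len(g))
print([round(fr, 4) for fr in fractions])

# After one step the function is nonzero at a single value of the sum of
# the N - 1 remaining variables, so the fraction is one binomial weight.
one_step_expected = comb(N - 1, N // 2) / 2 ** (N - 1)
print(one_step_expected)
```

At this small N the fractions are below 1/2 but not yet much smaller than 1/2; the $Cj/\sqrt{N}$ bound quoted above becomes restrictive only at larger N.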

Next we argue that the renormalization group approach distinguishes any function of the arithmetic
sum of the inputs from a generic Boolean function. The physical intuition underlying the argument
is that all the functions in the sequence depend only on the arithmetic sum of the undecimated
variables, and when the number of undecimated variables is M, the number of configurations of the
undecimated variables whose arithmetic sum is constrained to be S is $M!/[S!(M - S)!]$. One can
use Stirling's series [28] to show explicitly that when N is large, then the number of configurations
with a given value of S is a polynomial in 1/N times $2^N$ for a number of values of S that grows as
the square root of N. Therefore, the differences in the fraction of configurations yielding different
values of S decay polynomially with N, and the fraction of input configurations yielding one should
either be exactly 1/2 or else must deviate from 1/2 by an amount that decreases only polynomially
with N.



3 Renormalization procedure for characterizing functions that can 
be constructed using polynomially bounded resources. 

This section addresses the relationship between non-generic phases of Boolean functions and the 
computational complexity class P of functions that can be computed with polynomially bounded 
resources. 

There are functions that are in P that are neither polynomials of degree ξ with ξ < N nor functions
of composite variables. For example, because the sum of two functions that are in P is in P,
a sum of any function that is in P with a small "generic" piece specified by Eq. (1) with the
coefficients chosen independently and randomly to be one with probability $p_0 = P(N)2^{-N}$, where
$P(N)$ is a polynomial in N, is in P. Eq. (7) shows that $\ell$ renormalizations cause the value of $p_\ell$
to grow exponentially with $\ell$, $p_\ell = 2^{\ell} p_0$; in renormalization group parlance [10] the remainder is
a "relevant" perturbation. Since the generic piece renormalizes towards the generic fixed point at
which exponentially close to half the inputs yield a nonzero output, whether or not the function
resulting from many renormalizations can be identified as non-generic depends on whether the first
piece yields a nonzero result after many renormalizations. A function of a composite variable yields
a result different both from zero and from that of generic functions, and when a small generic piece
is added to such a function, renormalization still yields a non-generic result. However, because after
ξ + 1 renormalizations of a polynomial of order ξ one obtains zero, renormalizing functions that
are the sum of a low-order polynomial and a small generic piece yields zero plus the generic result,
and so cannot be identified as non-generic by straightforward application of the renormalization
transformation.
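
The following sketch illustrates this point on a toy example: a degree-2 polynomial plus a piece that is nonzero on only 4 of 1024 inputs. After ξ + 1 = 3 decimations the polynomial part has vanished and only the renormalized sparse piece remains, with its density increased by the factor $2^{\ell}$ of Eq. (7). The particular polynomial, the four input indices, and the helper names are hypothetical choices made for illustration.

```python
def decimate(table, i):
    """Apply Eq. (2) to a truth table, removing the variable at position i."""
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

def poly_table(n, monomials):
    """Truth table of a mod-2 sum of monomials given as variable bitmasks."""
    return [
        sum((idx & m) == m for m in monomials) & 1
        for idx in range(1 << n)
    ]

N, xi = 10, 2
# Hypothetical low-degree part: x0*x1 + x3*x7 + x5 + x2*x9 (mod 2).
monomials = [
    (1 << 0) | (1 << 1),
    (1 << 3) | (1 << 7),
    (1 << 5),
    (1 << 2) | (1 << 9),
]
poly = poly_table(N, monomials)

# Hypothetical sparse piece: nonzero on 4 of the 2**10 = 1024 inputs.
sparse = [0] * (1 << N)
for idx in (5, 300, 777, 1000):
    sparse[idx] = 1

f = [p ^ s for p, s in zip(poly, sparse)]

# Decimate the combined function and the sparse piece alone, xi + 1 times.
g, r = f, sparse
for _ in range(xi + 1):
    g, r = decimate(g, 0), decimate(r, 0)

print(g == r, sum(g) / len(g), sum(sparse) / len(sparse))
```

The equality `g == r` reflects the linearity of the transformation under mod-2 addition: the low-degree part contributes nothing after ξ + 1 steps, so the output looks like a renormalized sparse random function.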

The number of polynomials of N variables with degree ξ is $2^{\sum_{k=1}^{\xi} N!/[k!(N-k)!]}$ [26], which when
ξ ≪ N can be approximated as $2^{(eN/\xi)^{\xi}}$. Therefore, when ξ scales as a fractional power of N,
there are many more polynomials of degree ξ than there are functions in P. On the other hand,
the product of all N variables $x_1 \cdots x_N$ is in P, so there are functions in P that cannot be written
as polynomials of degree ξ for any ξ < N. Therefore, using our definition of a phase based on the
behavior yielded by repeated renormalization, P is not a phase. There are non-generic functions
that are not in P and there are functions in P that are in the generic phase. However, note that a
product of M variables is nonzero for only a fraction $2^{-M}$ of the input configurations. For example,
the term $x_1 x_2 \cdots x_R$ is nonzero only for input configurations that have $x_1 = x_2 = \cdots = x_R = 1$.
The sum of a polynomially large number M of terms of this type is nonzero only on a fraction of
inputs that is bounded above by $M/2^R$. In Appendix A it is argued that the functions in P that
are in the generic phase have the property that for any ξ < N, any Boolean function of N variables
$f(x_1, \ldots, x_N)$ that is in P can be written as the sum:

$$
f(x_1, \ldots, x_N) = \mathcal{P}_\xi(x_1, \ldots, x_N) \oplus \mathcal{R}_\xi(x_1, \ldots, x_N) , \tag{10}
$$

where $\mathcal{P}_\xi(x_1, \ldots, x_N)$ is a polynomial of degree no more than ξ and the remainder term $\mathcal{R}_\xi(x_1, \ldots, x_N)$
is nonzero on a fraction of input configurations that is bounded above by $C2^{-\alpha\xi/\log_2(N)}$, with C and
α positive constants.

As discussed above, using the RG transformation to identify functions that satisfy Eq. (10) is not
entirely straightforward: the obvious strategy, seeing if the functions obtained after renormalizing
ξ + 1 times have a small remainder term, fails because renormalization yields exponential growth
in the fraction of input configurations for which the remainder term is nonzero. This difficulty can
be circumvented by examining all functions that differ from the function in question on a fraction
of input configurations no greater than $C2^{-\alpha\xi/\log_2(N)}$. If the original function obeys Eq. (10),
then one of the "perturbed" functions will have a remainder term that is zero, and applying the
renormalization transformation to it ξ + 1 times yields zero for all choices of the decimated variables.
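
For very small N this search over perturbed functions can be carried out by brute force. The sketch below is a hedged illustration and not an efficient algorithm: it tries every set of at most `max_flips` output changes and tests each candidate by applying the transformation ξ + 1 times for every choice of decimated variables. The function names, the example function, and the flip budget are hypothetical.

```python
from itertools import combinations

def decimate(table, i):
    """Apply Eq. (2) to a truth table, removing the variable at position i."""
    n = len(table).bit_length() - 1
    low_mask = (1 << i) - 1
    out = []
    for r in range(1 << (n - 1)):
        idx0 = ((r >> i) << (i + 1)) | (r & low_mask)
        out.append(table[idx0] ^ table[idx0 | (1 << i)])
    return out

def has_degree_at_most(table, xi):
    """True if every choice of xi + 1 decimated variables yields zero."""
    n = len(table).bit_length() - 1
    for positions in combinations(range(n), xi + 1):
        g = table
        # Remove the highest position first so lower positions stay valid.
        for i in sorted(positions, reverse=True):
            g = decimate(g, i)
        if any(g):
            return False
    return True

def find_nearby_low_degree(table, xi, max_flips):
    """Return the first tuple of flipped output indices (fewest flips first)
    that turns table into a polynomial of degree <= xi, or None."""
    for k in range(max_flips + 1):
        for flips in combinations(range(len(table)), k):
            candidate = list(table)
            for idx in flips:
                candidate[idx] ^= 1
            if has_degree_at_most(candidate, xi):
                return flips
    return None

N, xi = 4, 1
# Hypothetical example: x0 + x1 + x3 (mod 2) with the output at index 6 flipped.
f = [(idx ^ (idx >> 1) ^ (idx >> 3)) & 1 for idx in range(1 << N)]
f[6] ^= 1

print(has_degree_at_most(f, xi), find_nearby_low_degree(f, xi, max_flips=1))
```

The number of candidates with k flipped outputs is the binomial coefficient of $2^N$ and k, so the cost of this direct search grows very rapidly with N; the example is meant only to make the procedure concrete.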

There are functions known to be in P that can be written as the sum of a function of a composite
variable plus a function that is nonzero on a small fraction of inputs. Nongeneric behavior is
obtained upon renormalization for all such functions except for those for which all functions in
the renormalization sequence yield one for exactly half the input configurations. The procedure for
identifying such functions is exactly analogous to that for identifying functions that can be approximated
as low-order polynomials: examine the properties under renormalization of all the functions that
yield the same output as the one in question except for a small fraction of the inputs.

Finally, we note that in Appendix B it is demonstrated that almost all generic random functions 
do not satisfy Eq. (10) when ξ scales as a fractional power of N.



4 Discussion 

This paper presents a renormalization group approach that distinguishes generic Boolean functions 
of N variables from functions that can be written as a polynomial of degree ξ, with ξ ≪ N, and
also from functions that depend only on composite quantities such as the arithmetic sum of all 
the input variables. The method provides a consistent framework for identifying many different 
functions as non-generic. 

The renormalization group approach also provides a natural framework for understanding why 
the P versus NP question is so difficult. Functions computable with polynomial resources do not 
comprise a phase: there are functions that are in a non-generic phase that are not in P, and there
are functions in P for which the renormalization group yields a "flow" that is towards the generic
fixed point and hence are in the "generic" phase. The possibility of using the RG approach to
demonstrate that a given Boolean function is not in P arises because it is possible that all functions
in P that are in the generic phase are close to a phase boundary of a non-generic phase. Whether
the renormalization group approach can provide a means for determining whether or not P is distinct
from NP depends on whether it is possible to demonstrate that all efficiently computable functions
are in or near a non-generic phase.

The procedure used here of using the behavior yielded by a renormalization group transformation 
to identify different phases of Boolean functions is entirely analogous to a procedure presented by 
Wilson [13] to identify different thermodynamic phases of the Ising model, used to describe mag- 
netism in solids. Wilson showed that individual configurations of Ising models could be identified 
as being in either a ferromagnetic phase or paramagnetic phase by repeatedly eliminating spins and 
examining the resulting configurations: if after many renormalizations all the spins are aligned,
then the system is in the ferromagnetic phase, while if after many renormalizations the spin orientations
are random, then the system is in the paramagnetic phase. Viewing the analogy between the
results for magnets and the qualitatively different behavior of the renormalization group "flows"
for polynomials of degree ξ, for functions of composite variables, and for generic Boolean functions
as an indication that low-degree polynomials and functions of composite variables are both non-generic
"phases," we propose the schematic phase diagram for Boolean functions, shown in Fig. 1.



Figure 1: Schematic phase diagram for Boolean functions. Within the set of all Boolean functions
of N Boolean variables there is a generic phase, a phase consisting of functions that can be written
as polynomials of order no greater than ξ with ξ ≪ N, and there are phases corresponding to
functions of composite variables such as the arithmetic sum of all the inputs. Some polynomials of
degree ξ are not in P, and some functions that can be computed with polynomial resources cannot
be written either as polynomials of degree ξ for any ξ < N or as functions of a composite variable.
Therefore, P does not denote a phase. However, we conjecture that all functions in P are
either in a non-generic phase or else very close to the low-order-polynomial phase boundary.
[Region labels in the figure: "All Boolean functions"; "Polynomials of degree ξ (1 ≪ ξ ≪ N)"; "Functions in P".]



If it can be shown that all functions in P are either in a non-generic phase or else very close to a 
phase boundary, then the procedure described here leads to a specific algorithmic approach to the 
P versus NP question — if a given function that is obtained as the answer to a problem in NP fails 
to be close enough to a non-generic phase, then one has shown that P is not equal to NP. (Ref. [29] 
advocates a family of candidate functions for testing using the strategy proposed in this paper, 
but the strategy can be implemented for any candidate function.) Appendix B shows that almost 
all Boolean functions are not close to non-generic phase boundaries. Appendix A argues that the 
construction of a function in P that does not satisfy Eq. (10) requires delicate balancing that may
signal the existence of a composite variable, but the argument is only speculative. Progress on this
issue is the key to using the RG approach to be able to address the P versus NP question.

Because the procedure discussed in Sec. 3 requires a number of operations that scales superexponentially
with N, the procedure proposed here is not a "natural proof" as discussed in Ref. [21] and
therefore does not yield a method for breaking pseudorandom number generators. However, direct
numerical implementation of the procedure is not likely to be computationally feasible.



5 Conclusions 



This paper presents a renormalization group approach that can be used to distinguish a generic 
Boolean function from (1) a Boolean function of N variables that can be written as a polynomial
of degree ξ with ξ < N, and (2) a function that depends only on a composite variable (such as
the arithmetic sum of the inputs). An algorithm for determining whether a function differs from a
polynomial of degree ξ on a fraction of inputs that is exponentially small in ξ/log(N) is presented.
The possible relevance of these results to the question of whether P and NP are distinct is discussed. 


Appendix A Characterization of the functions that can be constructed
with a polynomially large number of operations.

In this appendix we examine the properties of functions that can be computed with polynomially
bounded resources. First we discuss why it is plausible that almost all functions in P can be written in
the form Eq. (10), which is the sum of two terms, the first a polynomial of degree ξ, and the second
a correction term that is nonzero on a fraction of input configurations that is exponentially small
in ξ/log(N). We then examine known functions in P that cannot be written in this form, arguing
that they have special properties that may give rise to the emergence of a composite variable on 
which the function depends, which would lead to non-generic behavior upon renormalization. 

To see why it is hard to construct functions in P that do not satisfy Eq. (10), we consider the
process by which functions can be constructed. First we show that a starting polynomial that is
the sum of polynomially many terms whose factors are all either $x_i$ or $(1 - x_i)$ satisfies Eq. (10).
Then we show that the sum of two functions that each obey Eq. (10) also satisfies Eq. (10), and also
that the coefficient multiplying the correction term grows sufficiently slowly that the bound remains
true even after a number of additions that grows polynomially with $N$. We then consider products
of such functions. The behavior is more complicated, but we argue that a similar decomposition
works in most circumstances because when many terms are multiplied together, the result is nonzero
only on a small fraction of inputs. Finally, we examine some functions in P which do not satisfy
Eq. (10) and note that they involve a delicate balance that enables the sum of a finite number
of products to be nonzero on the same fraction of inputs as the individual terms. It is plausible
that this nongeneric property is associated with the nongeneric behavior of these functions upon
renormalization.

First consider a polynomial $A(x_1, \ldots, x_N)$ that is the mod-2 sum of polynomially many terms that
are all of the form $y_{i_1} \cdots y_{i_m}$, where $y_i$ is either $x_i$ or $1 - x_i$:

$$A(x_1, \ldots, x_N) = C_0 + \sum_{\eta=1}^{N} \sum_{k_\eta=1}^{M_\eta} y_{i_1(\eta, k_\eta)} \cdots y_{i_\eta(\eta, k_\eta)} . \tag{11}$$

Here, $C_0$ is a constant, $\eta$ denotes the number of factors of $y_i$ in a term, $k_\eta$ is the index labeling the
different terms with $\eta$ factors, $i_j(\eta, k_\eta)$ denotes the index of the $j$th factor in the term $k_\eta$, and each
$M_\eta$, the number of terms with $\eta$ factors, is bounded above by a polynomial of $N$. We will obtain
bounds on the number of configurations for which the output is nonzero by considering standard
addition instead of modulo-two addition, which means that we will be overcounting by including
configurations for which an even number of terms in the polynomial expansion are nonzero. Each
term with $\eta$ factors is nonzero only on a fraction $2^{-\eta}$ of the inputs. Therefore, if we define $p_A(\eta)$
to be the fraction of inputs of $A(x_1, \ldots, x_N)$ for which the sum of all the terms with $\eta$ factors is
nonzero, we have

$$p_A(\eta) < C_A 2^{-\alpha\eta} , \tag{12}$$

for constant $C_A$ and $\alpha = 1 - \epsilon$, with $\epsilon$ infinitesimal.
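
The counting behind Eq. (12) can be checked by brute force for small $N$. The sketch below is only an illustration: the helper names `term_value` and `nonzero_fraction`, and the encoding of a primitive term as a list of (index, negated) pairs, are assumptions made for this example and do not come from the paper.

```python
from itertools import product

def term_value(term, x):
    """Evaluate one primitive term on the input tuple x.

    `term` is a list of (index, negated) pairs (a hypothetical encoding):
    the factor is x[index] if negated is False, and 1 - x[index] otherwise.
    """
    value = 1
    for index, negated in term:
        value *= (1 - x[index]) if negated else x[index]
    return value

def nonzero_fraction(terms, n):
    """Fraction of the 2**n inputs on which the ordinary (not mod-2) sum is nonzero."""
    hits = sum(
        1
        for x in product((0, 1), repeat=n)
        if sum(term_value(t, x) for t in terms) != 0
    )
    return hits / 2 ** n

# A single term with eta = 3 distinct factors is nonzero on exactly 2**-3 of the inputs.
single = [[(0, False), (1, True), (2, False)]]
print(nonzero_fraction(single, 5))  # 0.125

# With several eta-factor terms, the fraction is at most (number of terms) * 2**-eta.
several = [
    [(0, False), (1, False), (2, False)],
    [(1, True), (3, False), (4, False)],
    [(0, True), (2, True), (4, True)],
]
print(nonzero_fraction(several, 5) <= len(several) * 2 ** -3)  # True
```

Using ordinary addition here mirrors the overcounting described in the text: an input is counted whenever any term is nonzero, regardless of parity.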

Now consider the addition of two functions $P(x_1, \ldots, x_N)$ and $Q(x_1, \ldots, x_N)$ that satisfy Eq. (10)
for positive $C_P$, $C_Q$, and $\alpha$. Again we consider standard addition instead of modulo-two addition.
Because the sum $S(x_1, \ldots, x_N) = P(x_1, \ldots, x_N) + Q(x_1, \ldots, x_N)$ has the property that every term
in the sum appears in at least one of the summands, we have

$$p_S(\eta) \le p_P(\eta) + p_Q(\eta) ; \tag{13}$$

the sum obeys Eq. (12) with the same value of $\alpha$ and with $C_S \le C_P + C_Q$. Adding polynomially
many terms can increase the prefactor only by an amount that grows no faster than polynomially
in $N$.

We next consider the product of two functions that satisfy Eq. (12). We write

$$\begin{aligned} A(x) &= P^A_\xi(x) + R^A_\xi(x) , \\ B(x) &= P^B_\xi(x) + R^B_\xi(x) , \end{aligned} \tag{14}$$

where $P^A_\xi$ and $P^B_\xi$ are polynomials of order $\xi$ with $T_A$ and $T_B$ terms respectively, and $R^A_\xi(x)$ and
$R^B_\xi(x)$ are both nonzero on a fraction of inputs that is less than $C 2^{-\alpha\xi}$ for positive constants $C$ and
$\alpha$.

We write the product of $A(x)$ and $B(x)$ as

$$\begin{aligned} D(x) &= A(x)B(x) \\ &= (P^A_\xi(x) + R^A_\xi(x))(P^B_\xi(x) + R^B_\xi(x)) \\ &= P^A_\xi(x)P^B_\xi(x) + P^A_\xi(x)R^B_\xi(x) + R^A_\xi(x)P^B_\xi(x) + R^A_\xi(x)R^B_\xi(x) . \end{aligned} \tag{15}$$

Now $P^A_\xi(x)R^B_\xi(x)$ is nonzero on no more inputs than $R^B_\xi(x)$ (this follows since a product is nonzero
only if each of its factors is nonzero). Similarly, $R^A_\xi(x)P^B_\xi(x)$ is nonzero on no more inputs than
$R^A_\xi(x)$, and $R^A_\xi(x)R^B_\xi(x)$ is nonzero on no more inputs than either $R^A_\xi(x)$ or $R^B_\xi(x)$, so the fraction of
inputs on which the sum of the last three terms is nonzero must be less than $3C 2^{-\alpha\xi}$. Therefore, these
contributions to the remainder term in the product remain exponentially small, with a coefficient
that remains bounded by a polynomial in $N$ after polynomially many multiplications. Thus,
it only remains to consider the properties of the product $P^A_\xi(x)P^B_\xi(x)$, which we write

$$P^A_\xi(x)P^B_\xi(x) = P^D_\xi(x) + R^D_\xi(x) , \tag{16}$$

where $P^D_\xi(x)$ is a polynomial of degree $\xi$ and $R^D_\xi(x)$ is a remainder term that we need to bound.
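
The split in Eq. (16) can be made concrete with a small sketch. It is a simplification under stated assumptions: polynomials are over GF(2) in the variables $x_i$ only (no $1 - x_i$ factors), a monomial is stored as a frozenset of variable indices, and the names `multiply` and `split_by_degree` are hypothetical helpers introduced for this example.

```python
from itertools import product

def multiply(p, q):
    """Multiply two GF(2) polynomials given as sets of monomials.

    A monomial is a frozenset of variable indices. For Boolean variables
    x_i * x_i = x_i, so multiplying monomials is a set union, and equal
    monomials cancel in pairs under mod-2 addition.
    """
    result = set()
    for a, b in product(p, q):
        result ^= {a | b}  # symmetric difference implements mod-2 addition
    return result

def split_by_degree(p, xi):
    """Return (terms of degree <= xi, terms of degree > xi)."""
    low = {m for m in p if len(m) <= xi}
    return low, p - low

# Two degree-2 polynomials with T_A = T_B = 3 terms each.
A = {frozenset({0, 1}), frozenset({2}), frozenset({3, 4})}
B = {frozenset({1, 2}), frozenset({4}), frozenset({0, 5})}

D = multiply(A, B)
low, high = split_by_degree(D, 2)  # low plays the role of P^D, high of R^D

print(len(D) <= len(A) * len(B))  # True: no more than T_A * T_B terms
print(len(low), len(high))        # 3 6
```

In this example no terms cancel, so the product has exactly $T_A T_B$ terms; cancellations can only reduce the count.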

To bound the magnitude of the remainder, let us multiply out the polynomials in Eq. (16) so that
they are all sums of terms that are products of the form $y_{i_1} \cdots y_{i_j}$, terms that we will denote as
"primitive." Let $T_A$ be the number of primitive terms in $P^A_\xi(x)$, and $T_B$ be the number of primitive
terms in $P^B_\xi(x)$. Note that every primitive term in the product with more than $\xi$ factors is nonzero
on a fraction $2^{-\xi}$ or less of the input configurations.

Since the total number of primitive terms in $R^D_\xi(x)$ is bounded above by $T_A T_B$, the fraction of
inputs on which the sum of the terms with at least $\xi$ factors is nonzero is bounded above by
$T_A T_B 2^{-\xi}$. So long as $T_A$ and $T_B$ are both less than exponentially large in $\xi$, this remainder
term is exponentially small in $\xi$. The multiplication process must start with values of $T_A$ and $T_B$
that are both bounded by a polynomial of $N$, but because multiplications can be composed, we
need to examine the behavior of $T_D$, the number of primitive terms in $P^D_\xi(x)$.

A simple upper bound for $T_D$ is obtained by ignoring all possible simplifications that could reduce
the total number of terms in the product:

$$T_D \le T_A T_B . \tag{17}$$

This equation describes geometric growth. If $M$ polynomials are multiplied together, all of which
have fewer than $C N^Y$ terms for fixed $C$ and $Y$, then the total number of terms in the product,
$T_M$, satisfies the bound

$$T_M < (C N^Y)^M . \tag{18}$$

This bound on the number of terms in the product is much smaller than $2^\xi$ so long as $M$ satisfies

$$M \ll \xi/(Y \log_2 N + \log_2 C) . \tag{19}$$
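
As a worked illustration of Eqs. (18) and (19), the following sketch evaluates both sides in base-2 logarithms. The function names and the sample values $N = 1024$, $\xi = 100$, $C = Y = 1$ are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

def max_factor_count(xi, n, c=1.0, y=1.0):
    """Right-hand side of Eq. (19): xi / (Y log2 N + log2 C)."""
    return xi / (y * math.log2(n) + math.log2(c))

def log2_term_bound(m, n, c=1.0, y=1.0):
    """log2 of the bound (C N^Y)^M on the number of terms, Eq. (18)."""
    return m * (math.log2(c) + y * math.log2(n))

n, xi = 1024, 100                   # illustrative values, not from the paper
print(max_factor_count(xi, n))      # 10.0
print(log2_term_bound(3, n) < xi)   # True: at most 2**30 terms, far below 2**100
```

With these values, a product of three factors has at most $2^{30}$ terms, so $T_M 2^{-\xi}$ is at most $2^{-70}$.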

A useful bound on multiplicative terms that are products of more than $\xi/(Y \log_2 N)$ factors can be
obtained by exploiting the fact that the product of two functions is nonzero for a given input only
if each of the factors is. Specifically, consider the product $AB$, and say that $A$ is nonzero on a set of
$M_A$ inputs. If $B$ is nonzero on less than a fraction $\sigma$ of the inputs in this set for some $1/2 < \sigma < 1$,
then the product $AB$ is nonzero on fewer than $\sigma M_A$ inputs, and if not, then the product $A(1 - B)$
is nonzero on fewer than $(1 - \sigma)M_A$ inputs, and one can write $AB = A + A(1 - B)$, with addition
taken modulo two. [30]

The result of $M$ multiplications is then nonzero only on a fraction of inputs bounded above by
$\sigma^M = 2^{-M \log_2(1/\sigma)}$. Therefore, a product of more than $\xi/(Y \log_2 N)$ factors is nonzero on no more than a
fraction $2^{-C\xi/\log_2(N)}$ of the inputs, where $C$ is a positive constant, and the entire product can be
moved into the remainder term.
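
The mod-2 identity $AB = A + A(1 - B)$ and the way the support of $A$ splits between $AB$ and $A(1 - B)$ can be checked by enumeration. The two example functions below (an OR of two inputs and an AND of two others) and the helper `support_size` are arbitrary choices made for this sketch.

```python
from itertools import product

N_VARS = 4

def func_a(x):
    """Example function A: OR of the first two inputs."""
    return x[0] | x[1]

def func_b(x):
    """Example function B: AND of the last two inputs."""
    return x[2] & x[3]

def support_size(f):
    """Number of inputs on which f is nonzero."""
    return sum(1 for x in product((0, 1), repeat=N_VARS) if f(x))

def a_times_b(x):
    return func_a(x) * func_b(x)

def a_times_not_b(x):
    return func_a(x) * (1 - func_b(x))

# AB = A + A(1 - B) holds modulo two on every input.
identity_holds = all(
    a_times_b(x) % 2 == (func_a(x) + a_times_not_b(x)) % 2
    for x in product((0, 1), repeat=N_VARS)
)
print(identity_holds)  # True

# The support of A is split between AB and A(1 - B).
print(support_size(func_a), support_size(a_times_b), support_size(a_times_not_b))  # 12 3 9
```

Because the two supports partition the support of $A$, at least one of the two products is nonzero on no more than half of it, which is the property the bound relies on.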

The arguments above indicate that the remainder term tends to be small for products because the
number of terms in the polynomial that are of order $\xi$ or less can be bounded for products of small
numbers of terms, and products of many terms are nonzero on a small enough fraction of the input
configurations that they can be considered to be part of the remainder term. However, there are
functions in P that do not obey Eq. (10). Two examples of functions in P that have been
proven to violate Eq. (10) are MAJORITY (which is one when more than half of the input variables have
been set to one and zero otherwise) [18] and DIVISIBILITY MOD $p$, which is one if the sum of
the input variables is divisible by an odd prime $p$ [19,20]. Both these functions depend only on
the arithmetic (not mod-2) sum of all the variables, $x_1 + x_2 + \cdots + x_N$. Calculating the sum of $N$
variables can be done with polynomially bounded resources because one need only keep track of a
running sum, which is the same for many different values of the individual $x_j$. For instance, when
$k = N$, there are $N!/((N/2)!)^2 \approx 2^N/\sqrt{\pi N/2}$ different ways to choose the $x_1, \ldots, x_k$ so that their
sum is $N/2$.
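
The number of inputs whose sum is $N/2$ can be computed exactly for moderate $N$ and compared with the standard Stirling estimate $2^N/\sqrt{\pi N/2}$ of the central binomial coefficient. This is a minimal numerical check; the function names are chosen for this example.

```python
import math

def central_count(n):
    """Exact number of length-n bit strings with exactly n/2 ones (n even)."""
    return math.comb(n, n // 2)

def stirling_estimate(n):
    """Stirling estimate of the central binomial coefficient."""
    return 2 ** n / math.sqrt(math.pi * n / 2)

for n in (10, 100, 1000):
    ratio = central_count(n) / stirling_estimate(n)
    print(n, ratio)  # the ratio approaches 1 as n grows
```

The count is exponentially large in $N$, reduced from $2^N$ only by a factor of order $\sqrt{N}$, which is why one value of the running sum covers so many input configurations.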

It is instructive to consider an algorithm for computing DIVISIBILITY MOD 3 to see how the
function avoids being a low-order polynomial. Some pseudocode for a simple algorithm for this
problem is:

divisibility mod 3:

    start: remainder0[0] = 1, remainder1[0] = remainder2[0] = 0
    for each i ≥ 0:
        remainder0[i + 1] = remainder0[i] * (1 - x_{i+1}) ⊕ remainder2[i] * x_{i+1}
        remainder1[i + 1] = remainder1[i] * (1 - x_{i+1}) ⊕ remainder0[i] * x_{i+1}
        remainder2[i + 1] = remainder2[i] * (1 - x_{i+1}) ⊕ remainder1[i] * x_{i+1}
    answer = remainder0[N]

The quantity remainder0[i] + remainder1[i] + remainder2[i] is unity for every i, and the fraction of
inputs for which each remainder variable is nonzero is very close to 1/3 and does not decay
exponentially with i. The fractions do not decay or grow because the equation for each remainder
for a given i is the sum of two products. The product remainder0[i] * (1 - x_{i+1}) is nonzero on half
the inputs on which remainder0[i] is nonzero, and similarly for the other term remainder2[i] * x_{i+1}.
Because remainder0[i + 1] is the sum of two terms, each of which is nonzero on almost exactly half
the inputs for which remainder0[i] is nonzero, the fraction of inputs on which remainder0[j] is nonzero
remains of order unity, but less than unity, for all j. It is plausible that this exquisite cancellation
leads to the existence of a composite variable on which the function depends, or, more generally, to
non-generic behavior upon renormalization. Because obtaining a function that cannot be written as a
low-order polynomial plus a term that is nonzero only on a small fraction of input configurations
requires a series of delicate cancellations, it is also extremely plausible that the fraction of
functions that are in P and do not satisfy Eq. (10) is extremely small.
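
A direct transcription of the recurrence into Python makes the two properties discussed above easy to verify for small $N$. This is a sketch: the function names are hypothetical, and only the current values of the three remainder indicators are kept rather than full arrays.

```python
from itertools import product

def divisible_by_3(x):
    """Return 1 if sum(x) is divisible by 3, using the three-indicator recurrence."""
    r0, r1, r2 = 1, 0, 0  # before reading any input the running sum is 0
    for bit in x:
        r0, r1, r2 = (
            (r0 * (1 - bit)) ^ (r2 * bit),
            (r1 * (1 - bit)) ^ (r0 * bit),
            (r2 * (1 - bit)) ^ (r1 * bit),
        )
        assert r0 + r1 + r2 == 1  # exactly one indicator is set at every step
    return r0

def fraction_divisible(n):
    """Fraction of the 2**n inputs whose sum is divisible by 3."""
    return sum(divisible_by_3(x) for x in product((0, 1), repeat=n)) / 2 ** n

for n in (3, 6, 9, 12):
    print(n, fraction_divisible(n))  # stays close to 1/3 instead of decaying
```

The loop uses a number of operations linear in $N$, while the fraction of inputs on which the answer is one stays near 1/3 for every $N$ tested.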

As discussed in the main text, the renormalization group distinguishes functions that depend on the
sum of the values of the input variables from generic Boolean functions because the renormalization
transformation preserves the property that a given value for the composite variable occurs for an
exponentially large number of input configurations. Moreover, at least when the composite variable
is the arithmetic sum of the inputs, the fractions of input configurations for which the sum of the
variables takes on different values differ by an amount that decays only polynomially with $N$.
Therefore, such functions can yield one either on exactly half the inputs or else on a fraction of the
inputs that differs from 1/2 by an amount that is at least as large as $N^{-x}$ for some positive $x$.

To summarize, in this appendix we discuss the restrictions on Boolean functions of $N$ variables that
can be computed with resources that are bounded above by a polynomial in $N$. Many functions
in P have the property that they can be written, for any fixed $\xi$, as the sum of a polynomial of
degree $\xi$ and a term that is nonzero on a fraction of inputs bounded above by $C 2^{-\alpha\xi/\log_2 N}$ for
positive constants $C$ and $\alpha$. Known functions in P that cannot be approximated by low-order
polynomials have the property that they have a dependence on a composite variable. The
renormalization group transformation provides a means for distinguishing both types of functions
from generic Boolean functions.



Appendix B: Demonstration that a typical Boolean function does 
not satisfy Eq. (10). 

In this appendix it is shown that for a typical Boolean function, changing the outputs for an
exponentially small fraction of the inputs does not yield a low-order polynomial. Specifically, given
a value of $\xi$ with $\xi \propto N^y$ with $0 < y < 1$, if one changes the output value of a typical Boolean
function for no more than $C 2^{N - \alpha\xi/\log_2(N)}$ input configurations, then the resulting function cannot
be written as a polynomial of degree $\xi$ or less. This is done by showing that the number of Boolean
functions that satisfy Eq. (10) is much less than the number of Boolean functions of $N$ variables.

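The number of Boolean functions of $N$ variables satisfying Eq. (10), $B(N,\xi)$, satisfies

$$B(N,\xi) \le \mathcal{F}(N,\xi)\,\mathcal{M}(N,\xi) , \tag{20}$$

where $\mathcal{F}(N,\xi)$ denotes the number of ways to choose up to $C 2^{N - \alpha\xi/\log_2(N)}$ input configurations and
$\mathcal{M}(N,\xi)$ is the number of polynomials of degree $\xi$.

Let $\mathcal{S} = C 2^{N - \alpha\xi/\log_2(N)}$ be the maximum number of configurations whose outputs we are allowed
to alter, and $\Omega = 2^N$ be the total number of input configurations. The quantity $\mathcal{F}(N,\xi)$ is the
number of ways that one can choose up to $\mathcal{S}$ items out of $\Omega$ possibilities. We have

$$\begin{aligned} \mathcal{F}(N,\xi) &= \sum_{j=0}^{\mathcal{S}} \frac{\Omega!}{j!\,(\Omega - j)!} \\ &\sim e(\Omega/\mathcal{S})^{\mathcal{S}} = e\left(2^{\alpha\xi/\log_2(N)}/C\right)^{\mathcal{S}} , \end{aligned} \tag{21}$$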

where the last line applies when $1 \ll \xi \ll N$. Next note that $\mathcal{M}(N,\xi)$, the number of different
polynomials of degree less than or equal to $\xi$, is:

$$\begin{aligned} \mathcal{M}(N,\xi) &= 2^{\sum_{j=1}^{\xi} N!/[j!(N-j)!]} \\ &\sim 2^{e(N/\xi)^{\xi}} , \end{aligned} \tag{22}$$

where again the last line assumes $1 \ll \xi \ll N$. Eq. (22) follows because all polynomials of degree $\xi$
or less can be written as a sum over all terms that are products of the form $x_{i_1} \cdots x_{i_j}$ with $j \le \xi$.
There are $\sum_{j=1}^{\xi} N!/[j!(N-j)!]$ such terms, and each coefficient can be either 1 or 0. Thus, when
$1 \ll \xi \ll N$, the total number of functions that satisfy Eq. (10) is bounded above by

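$$B(N,\xi) \le \left( e\left(2^{\alpha\xi/\log_2(N)}/C\right)^{C 2^{N - \alpha\xi/\log_2(N)}} \right) \left( 2^{e(N/\xi)^{\xi}} \right) , \tag{23}$$

which, as $N \to \infty$ and $\xi \propto N^{y}$ with $0 < y < 1$, is much smaller than $2^{2^N}$, the total number of
Boolean functions of $N$ Boolean variables.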
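
To get a feel for the sizes involved, the estimates in Eqs. (21) to (23) can be compared with $2^{2^N}$ by working with base-2 logarithms. The sketch below takes the asymptotic forms at face value; the function name and the sample parameters ($\alpha = C = 1$, $N = 256$, $\xi = 64$) are assumptions made purely for illustration.

```python
import math

def log2_bound_ratio(n, xi, alpha=1.0, c=1.0):
    """Return log2(estimated bound on B(N, xi)) divided by 2**N.

    A value r below 1 means the bound is about 2**(r * 2**N), which is
    smaller than the 2**(2**N) Boolean functions of N variables.
    """
    t = alpha * xi / math.log2(n)
    s = c * 2.0 ** (n - t)                               # max number of altered outputs
    log2_f = math.log2(math.e) + s * (t - math.log2(c))  # log2 of the Eq. (21) estimate
    log2_m = math.e * (n / xi) ** xi                     # log2 of the Eq. (22) estimate
    return (log2_f + log2_m) / 2.0 ** n

print(log2_bound_ratio(256, 64))  # about 0.03125
print(log2_bound_ratio(64, 8))    # below 1
```

For $N = 256$ and $\xi = 64$ the logarithm of the bound is only about 3% of $2^N$, so the functions satisfying Eq. (10) form a vanishing fraction of all Boolean functions.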

A second non-rigorous but informative argument to see that generic Boolean functions do not
satisfy Eq. (10) is to consider a generic Boolean function in which each coefficient is
chosen independently and randomly to be 1 or 0 with equal probability. For a typical Boolean
function, one can always find a configuration satisfying Eq. (10) by changing just about half the
output values so that the function has the same value for all inputs. The question is whether
one can obtain $g_{x_{i_1},\ldots,x_{i_M}}(x') = 0$ for all choices of the $M$ decimated variables by changing the
function for many fewer configurations than that. For a given $g$ in which $M$ variables have been
decimated, one can find a configuration satisfying $g_{x_{i_1},\ldots,x_{i_M}}(x') = 0$ for the $2^{N-M}$ different possible
$x'$ by changing the output for just about $2^{N-M-1}$ different input configurations. But one must
arrange for $g_{x_{j_1},\ldots,x_{j_M}}(x')$ to vanish for all possible choices of the $M$ variables to be decimated.
There are $N!/[M!(N-M)!] \sim e(N/M)^M$ different ways to choose the decimated variables, so a
naive estimate is that one must adjust $2^{N-M}$ configurations for each of $e(N/M)^M$ choices of the
decimated variables, or $2^{N-M+1+M\log_2(N/M)}$, which exceeds $2^N$ for all $M < N$. This argument is
useful because it makes it clear why one must examine all choices of the decimated variables to
distinguish functions that do not satisfy Eq. (10).